import pandas as pd
1 Goal
Today my goal is to write a Python program that picks books I haven't read yet from my Goodreads list at random, with options for filtering. It should be possible to filter by publication date, maximum number of pages, minimum rating, date added, and maybe more.
df = pd.read_csv('data/day24/goodreads_library_export.csv')
df.head(5)
| | Book Id | Title | Author | Author l-f | Additional Authors | ISBN | ISBN13 | My Rating | Average Rating | Publisher | ... | Date Read | Date Added | Bookshelves | Bookshelves with positions | Exclusive Shelf | My Review | Spoiler | Private Notes | Read Count | Owned Copies |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 51648276 | Drive Your Plow Over the Bones of the Dead | Olga Tokarczuk | Tokarczuk, Olga | Antonia Lloyd-Jones, Beata Poźniak | ="" | ="" | 0 | 3.94 | Penguin Audio | ... | NaN | 2023/11/08 | NaN | NaN | read | NaN | NaN | NaN | 1 | 0 |
1 | 18112493 | Parissyndromet | Heidi Furre | Furre, Heidi | NaN | ="8282880035" | ="9788282880039" | 0 | 4.12 | Flamme | ... | NaN | 2024/12/21 | NaN | NaN | read | NaN | NaN | NaN | 1 | 0 |
2 | 25489025 | The Vegetarian | Han Kang | Kang, Han | Deborah Smith | ="0553448188" | ="9780553448184" | 0 | 3.64 | Hogarth | ... | NaN | 2024/12/21 | NaN | NaN | read | NaN | NaN | NaN | 1 | 0 |
3 | 28921 | The Remains of the Day | Kazuo Ishiguro | Ishiguro, Kazuo | NaN | ="" | ="" | 0 | 4.14 | Faber & Faber | ... | NaN | 2025/07/15 | NaN | NaN | read | NaN | NaN | NaN | 1 | 0 |
4 | 43868109 | Empire of Pain: The Secret History of the Sack... | Patrick Radden Keefe | Keefe, Patrick Radden | NaN | ="0385545681" | ="9780385545686" | 0 | 4.54 | Doubleday | ... | NaN | 2025/07/10 | to-read | to-read (#298) | to-read | NaN | NaN | NaN | 0 | 0 |
5 rows × 24 columns
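The export wraps ISBNs in a spreadsheet-style `="..."` formula and stores dates as `YYYY/MM/DD` strings. The ISBN columns aren't kept in this post, but if they were needed, a minimal sketch of cleaning both could look like this. It assumes every ISBN follows the `="..."` pattern. The two rows are copied from the output above and trimmed to three columns so the snippet runs without the CSV file.

```python
import pandas as pd

# Values copied from two rows of the export above; the real file would be loaded
# with pd.read_csv(...), which also accepts parse_dates=['Date Added']
df = pd.DataFrame({
    'Title': ['Parissyndromet', 'The Remains of the Day'],
    'ISBN13': ['="9788282880039"', '=""'],
    'Date Added': ['2024/12/21', '2025/07/15'],
})

# Strip the leading '=' and the surrounding quotes; empty ISBNs become NaN
isbn = df['ISBN13'].str.strip('="')
df['ISBN13'] = isbn.where(isbn != '')

# Parse the YYYY/MM/DD strings into datetime64 values
df['Date Added'] = pd.to_datetime(df['Date Added'], format='%Y/%m/%d')

print(df)
```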
2 Data Cleaning
First, I need to clean the data a bit by removing unwanted rows and columns.
# Remove read books
to_read = df[df['Read Count'] == 0]
to_read
| | Book Id | Title | Author | Author l-f | Additional Authors | ISBN | ISBN13 | My Rating | Average Rating | Publisher | ... | Date Read | Date Added | Bookshelves | Bookshelves with positions | Exclusive Shelf | My Review | Spoiler | Private Notes | Read Count | Owned Copies |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 43868109 | Empire of Pain: The Secret History of the Sack... | Patrick Radden Keefe | Keefe, Patrick Radden | NaN | ="0385545681" | ="9780385545686" | 0 | 4.54 | Doubleday | ... | NaN | 2025/07/10 | to-read | to-read (#298) | to-read | NaN | NaN | NaN | 0 | 0 |
5 | 40163119 | Say Nothing: A True Story of Murder and Memory... | Patrick Radden Keefe | Keefe, Patrick Radden | NaN | ="0385521316" | ="9780385521314" | 0 | 4.47 | Doubleday | ... | NaN | 2025/07/10 | to-read | to-read (#297) | to-read | NaN | NaN | NaN | 0 | 0 |
6 | 42683 | On Writing | Ernest Hemingway | Hemingway, Ernest | Larry W. Phillips, Charles Scribner Jr. | ="0684854295" | ="9780684854298" | 0 | 4.02 | Scribner | ... | NaN | 2025/06/12 | to-read | to-read (#296) | to-read | NaN | NaN | NaN | 0 | 0 |
7 | 22816087 | Seveneves | Neal Stephenson | Stephenson, Neal | NaN | ="" | ="" | 0 | 4.00 | William Morrow | ... | NaN | 2025/06/11 | to-read | to-read (#295) | to-read | NaN | NaN | NaN | 0 | 0 |
8 | 50365 | A Suitable Boy (A Bridge of Leaves, #1) | Vikram Seth | Seth, Vikram | NaN | ="0060786523" | ="9780060786526" | 0 | 4.11 | Harper Perennial Modern Classics | ... | NaN | 2025/06/11 | to-read | to-read (#294) | to-read | NaN | NaN | NaN | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
391 | 28815 | Influence: The Psychology of Persuasion | Robert B. Cialdini | Cialdini, Robert B. | NaN | ="006124189X" | ="9780061241895" | 0 | 4.22 | Harper Business | ... | NaN | 2018/08/27 | to-read | to-read (#5) | to-read | NaN | NaN | NaN | 0 | 0 |
394 | 2255 | Way of the Peaceful Warrior: A Book That Chang... | Dan Millman | Millman, Dan | NaN | ="1932073205" | ="9781932073201" | 0 | 4.13 | HJ Kramer | ... | NaN | 2018/08/27 | to-read | to-read (#4) | to-read | NaN | NaN | NaN | 0 | 0 |
396 | 19795 | Power vs. Force: The Hidden Determinants of Hu... | David R. Hawkins | Hawkins, David R. | NaN | ="1561709336" | ="9781561709335" | 0 | 4.15 | Hay House | ... | NaN | 2018/08/21 | to-read | to-read (#3) | to-read | NaN | NaN | NaN | 0 | 0 |
404 | 566259 | Fire in the Belly: On Being a Man | Sam Keen | Keen, Sam | NaN | ="0553351370" | ="9780553351378" | 0 | 3.81 | Bantam | ... | NaN | 2018/08/21 | to-read | to-read (#2) | to-read | NaN | NaN | NaN | 0 | 0 |
405 | 1052 | The Richest Man in Babylon | George S. Clason | Clason, George S. | NaN | ="0451205367" | ="9780451205360" | 0 | 4.23 | Berkley Books | ... | NaN | 2018/08/21 | to-read | to-read (#1) | to-read | NaN | NaN | NaN | 0 | 0 |
308 rows × 24 columns
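Filtering on `Read Count == 0` works for this list. The export also has an `Exclusive Shelf` column, however, and in the rows shown it is either `read` or `to-read`. Here is a minimal sketch of selecting by shelf instead, on a few rows from the tables above reduced to the relevant columns:

```python
import pandas as pd

# A few rows from the export above, reduced to the relevant columns
df = pd.DataFrame({
    'Title': ['The Vegetarian', 'Seveneves', 'On Writing'],
    'Read Count': [1, 0, 0],
    'Exclusive Shelf': ['read', 'to-read', 'to-read'],
})

# Option 1: the approach used above
by_count = df[df['Read Count'] == 0]

# Option 2: rely on the shelf the book is placed on in Goodreads
by_shelf = df[df['Exclusive Shelf'] == 'to-read']

# Compare which rows each criterion selects for this sample
print(by_count.index.equals(by_shelf.index))
```

The two criteria could disagree in some cases. One example is a book on Goodreads' `currently-reading` shelf, which has a read count of 0. Another is a book that was read before and later put back on `to-read`. Which criterion fits depends on how the shelves are used.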
df.columns
Index(['Book Id', 'Title', 'Author', 'Author l-f', 'Additional Authors',
'ISBN', 'ISBN13', 'My Rating', 'Average Rating', 'Publisher', 'Binding',
'Number of Pages', 'Year Published', 'Original Publication Year',
'Date Read', 'Date Added', 'Bookshelves', 'Bookshelves with positions',
'Exclusive Shelf', 'My Review', 'Spoiler', 'Private Notes',
'Read Count', 'Owned Copies'],
dtype='object')
# Columns that I want to keep
columns = ['Title', 'Author', 'Average Rating', 'Publisher',
           'Number of Pages', 'Original Publication Year', 'Date Added']
to_read = to_read[columns]
to_read.head(5)
| | Title | Author | Average Rating | Publisher | Number of Pages | Original Publication Year | Date Added |
---|---|---|---|---|---|---|---|
4 | Empire of Pain: The Secret History of the Sack... | Patrick Radden Keefe | 4.54 | Doubleday | 535.0 | 2021.0 | 2025/07/10 |
5 | Say Nothing: A True Story of Murder and Memory... | Patrick Radden Keefe | 4.47 | Doubleday | 441.0 | 2018.0 | 2025/07/10 |
6 | On Writing | Ernest Hemingway | 4.02 | Scribner | 160.0 | 1984.0 | 2025/06/12 |
7 | Seveneves | Neal Stephenson | 4.00 | William Morrow | 872.0 | 2015.0 | 2025/06/11 |
8 | A Suitable Boy (A Bridge of Leaves, #1) | Vikram Seth | 4.11 | Harper Perennial Modern Classics | 1474.0 | 1993.0 | 2025/06/11 |
# Remove NaN values
to_read = to_read.dropna()
to_read.head(5)
| | Title | Author | Average Rating | Publisher | Number of Pages | Original Publication Year | Date Added |
---|---|---|---|---|---|---|---|
4 | Empire of Pain: The Secret History of the Sack... | Patrick Radden Keefe | 4.54 | Doubleday | 535.0 | 2021.0 | 2025/07/10 |
5 | Say Nothing: A True Story of Murder and Memory... | Patrick Radden Keefe | 4.47 | Doubleday | 441.0 | 2018.0 | 2025/07/10 |
6 | On Writing | Ernest Hemingway | 4.02 | Scribner | 160.0 | 1984.0 | 2025/06/12 |
7 | Seveneves | Neal Stephenson | 4.00 | William Morrow | 872.0 | 2015.0 | 2025/06/11 |
8 | A Suitable Boy (A Bridge of Leaves, #1) | Vikram Seth | 4.11 | Harper Perennial Modern Classics | 1474.0 | 1993.0 | 2025/06/11 |
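Without arguments, `dropna()` drops a book if any kept column is missing. That includes `Publisher`, even though the publisher filter is optional. The following minimal sketch counts missing values first and then drops only on the columns that must be present. The rows come from the tables above, with two values blanked out for illustration:

```python
import pandas as pd

# Rows from the tables above; one publisher and one page count blanked out
to_read = pd.DataFrame({
    'Title': ['Seveneves', 'On Writing', 'Utz'],
    'Publisher': ['William Morrow', 'Scribner', None],
    'Number of Pages': [872.0, None, 154.0],
    'Original Publication Year': [2015.0, 1984.0, 1988.0],
})

# Count missing values per column before deciding what to drop
missing = to_read.isna().sum()
print(missing)

# Drop only rows that lack the numeric columns the picker needs
cleaned = to_read.dropna(subset=['Number of Pages', 'Original Publication Year'])
print(f'Dropped {len(to_read) - len(cleaned)} of {len(to_read)} rows')
```

If rows with a missing publisher are kept, the publisher filter would need `str.contains(..., na=False)`. That way those rows are excluded rather than putting NaN into the boolean mask.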
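I notice that some of the columns are of type float; I want them to be integers instead. (pandas stores numeric columns that contain missing values as float64, which is likely why they were read as floats; after dropping the NaN rows they can be cast to int.)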
to_read.dtypes
Title object
Author object
Average Rating float64
Publisher object
Number of Pages float64
Original Publication Year float64
Date Added object
dtype: object
to_read = to_read.astype({'Number of Pages': int, 'Original Publication Year': int})
to_read['Date Added'] = pd.to_datetime(to_read['Date Added'])
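The float columns are most likely a side effect of missing values. pandas' default `int64` dtype cannot hold NaN, so a numeric column with gaps is stored as `float64`. Here is a small illustration using page counts from the table above with one value removed. It also shows pandas' nullable `Int64` dtype, an alternative that keeps integers without dropping rows:

```python
import pandas as pd

# A column with a missing value cannot be int64, so pandas falls back to float64
pages = pd.Series([535, None, 160])
print(pages.dtype)  # float64

# Once the missing value is removed, a plain int cast works
print(pages.dropna().astype(int).tolist())

# The nullable 'Int64' dtype (capital I) keeps integers and the missing value
nullable = pages.astype('Int64')
print(nullable.dtype)
```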
3 Creating a random book picker function
import random


def random_book(df, options: int = 1, title: str = None, author: str = None,
                min_rating: float = 0, publisher: str = None,
                min_year: int = None, max_year: int = None,
                added_year: int = None, added_month: int = None):
    """Return `options` random unread books from df that match all given filters."""
    # Case-insensitive substring search in the title
    if title is not None:
        df = df.loc[df['Title'].str.contains(title, case=False, regex=False)]

    # Case-insensitive substring search in the author name
    if author is not None:
        df = df.loc[df['Author'].str.contains(author, case=False, regex=False)]
        if df.empty:
            print("You haven't saved any books that you want to read by that author")
            return

    # Only filter when the threshold is at or above the lowest rating present
    if min_rating is not None and min_rating >= df['Average Rating'].min():
        df = df.loc[df['Average Rating'] >= min_rating]

    if publisher is not None:
        df = df.loc[df['Publisher'].str.contains(publisher, case=False, regex=False)]

    # Clamp the year bounds to the range in the data, then filter
    if min_year is not None:
        if min_year < df['Original Publication Year'].min():
            min_year = df['Original Publication Year'].min()
        df = df.loc[df['Original Publication Year'] >= min_year]

    if max_year is not None:
        if max_year > df['Original Publication Year'].max():
            max_year = df['Original Publication Year'].max()
        df = df.loc[df['Original Publication Year'] <= max_year]

    # Keep only books added in the given year
    if added_year is not None:
        df = df.loc[df['Date Added'].dt.year == added_year]

    if added_month is not None:
        if added_month > 12 or added_month < 1:
            print('Month out of range, choose a number between 1 and 12')
            return
        df = df.loc[df['Date Added'].dt.month == added_month]

    # random.sample below cannot pick from an empty selection
    if df.empty:
        print('No books match those filters')
        return

    # Pick `options` distinct books (never more than are left after filtering)
    books = random.sample(range(len(df)), k=min(options, len(df)))
    return df.iloc[books]
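pandas can also do the random pick itself. `DataFrame.sample` draws rows without replacement by default, so the same book is never returned twice. Its `random_state` argument also makes a pick reproducible. Below is a minimal sketch with a hypothetical helper `pick_books`, which is not part of the function above, run on a few rows from this post:

```python
import pandas as pd


def pick_books(df: pd.DataFrame, options: int = 1, seed=None) -> pd.DataFrame:
    """Return up to `options` distinct random rows of df (hypothetical helper)."""
    # sample() draws without replacement by default, so n must not exceed len(df)
    return df.sample(n=min(options, len(df)), random_state=seed)


to_read = pd.DataFrame({
    'Title': ['Utz', 'Nine Chains to the Moon',
              'Swann’s Way (In Search of Lost Time, #1)', 'Bushido: The Soul of Japan'],
    'Number of Pages': [154, 384, 468, 160],
})

# Passing a seed makes the pick repeatable, which helps when testing filters
print(pick_books(to_read, options=3, seed=42))
```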
4 Testing
random_book(to_read, added_month=2030)
Month out of range, choose a number between 1 and 12
random_book(to_read, title='japan')
| | Title | Author | Average Rating | Publisher | Number of Pages | Original Publication Year | Date Added |
---|---|---|---|---|---|---|---|
50 | Bushido: The Soul of Japan | Inazō Nitobe | 3.84 | Kodansha USA | 160 | 1899 | 2024-04-21 |
random_book(to_read, min_year=1800, max_year=1940)
| | Title | Author | Average Rating | Publisher | Number of Pages | Original Publication Year | Date Added |
---|---|---|---|---|---|---|---|
374 | The Brothers Karamazov | Fyodor Dostoevsky | 4.39 | Farrar, Straus and Giroux | 796 | 1880 | 2018-11-10 |
random_book(to_read, min_rating=4.1)
| | Title | Author | Average Rating | Publisher | Number of Pages | Original Publication Year | Date Added |
---|---|---|---|---|---|---|---|
64 | My Traitor's Heart: A South African Exile Retu... | Rian Malan | 4.25 | Grove Press | 349 | 1990 | 2023-04-05 |
random_book(to_read, author='Murakami')
You haven't saved any books that you want to read by that author
random_book(to_read, options=3)
| | Title | Author | Average Rating | Publisher | Number of Pages | Original Publication Year | Date Added |
---|---|---|---|---|---|---|---|
32 | Utz | Bruce Chatwin | 3.67 | Penguin Publishing Group | 154 | 1988 | 2024-12-29 |
321 | Nine Chains to the Moon | R. Buckminster Fuller | 3.85 | Southern Illinois University Press | 384 | 1963 | 2020-10-19 |
179 | Swann’s Way (In Search of Lost Time, #1) | Marcel Proust | 4.16 | Penguin Classics | 468 | 1913 | 2021-09-10 |
5 Conclusion
There we have it, a simple random book picker.
However, it isn't optimized for speed. I reassign the DataFrame after every filter instead of collecting all the conditions and applying them to the DataFrame in a single filtering operation.
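One way to apply the filters in a single step is to combine them into one boolean mask and index the DataFrame once. The following minimal sketch is separate from the function above. It adds a hypothetical `max_pages` parameter for the page limit mentioned in the goal, which `random_book` doesn't implement yet. The rows come from the tables above with shortened titles. For a list of a few hundred books, the speed difference will be negligible either way.

```python
import pandas as pd


def filter_books(df, min_rating=None, max_pages=None, min_year=None, max_year=None):
    """Build one boolean mask from the optional filters and apply it once (sketch)."""
    # Start with every row selected, then narrow the mask per filter
    mask = pd.Series(True, index=df.index)
    if min_rating is not None:
        mask &= df['Average Rating'] >= min_rating
    if max_pages is not None:
        mask &= df['Number of Pages'] <= max_pages
    if min_year is not None:
        mask &= df['Original Publication Year'] >= min_year
    if max_year is not None:
        mask &= df['Original Publication Year'] <= max_year
    # A single indexing operation at the end
    return df.loc[mask]


to_read = pd.DataFrame({
    'Title': ['Empire of Pain', 'On Writing', 'Seveneves', 'A Suitable Boy'],
    'Average Rating': [4.54, 4.02, 4.00, 4.11],
    'Number of Pages': [535, 160, 872, 1474],
    'Original Publication Year': [2021, 1984, 2015, 1993],
})

print(filter_books(to_read, min_rating=4.05, max_pages=600))
```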
The function also takes a lot of parameters; using *args and **kwargs instead could be an option.
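One way to act on the `**kwargs` idea is to keep a dictionary that maps each keyword to a function returning a boolean mask. Adding a filter then means adding one entry. This is a minimal sketch: `FILTERS`, `random_book_kw`, and the `max_pages` key are hypothetical names. Unknown keywords raise `TypeError`, the same way Python does for an unexpected keyword argument.

```python
import pandas as pd

# Hypothetical mapping from keyword name to a function that builds a boolean mask
FILTERS = {
    'title': lambda df, v: df['Title'].str.contains(v, case=False, regex=False),
    'author': lambda df, v: df['Author'].str.contains(v, case=False, regex=False),
    'min_rating': lambda df, v: df['Average Rating'] >= v,
    'max_pages': lambda df, v: df['Number of Pages'] <= v,
}


def random_book_kw(df: pd.DataFrame, options: int = 1, **filters) -> pd.DataFrame:
    """Apply any filters passed as keyword arguments, then sample (sketch)."""
    unknown = set(filters) - set(FILTERS)
    if unknown:
        raise TypeError(f'Unknown filter(s): {sorted(unknown)}')
    mask = pd.Series(True, index=df.index)
    for name, value in filters.items():
        mask &= FILTERS[name](df, value)
    matches = df.loc[mask]
    # Distinct rows, never more than the number of matches
    return matches.sample(n=min(options, len(matches)))


to_read = pd.DataFrame({
    'Title': ['Say Nothing', 'Seveneves', 'Utz'],
    'Author': ['Patrick Radden Keefe', 'Neal Stephenson', 'Bruce Chatwin'],
    'Average Rating': [4.47, 4.00, 3.67],
    'Number of Pages': [441, 872, 154],
})

print(random_book_kw(to_read, author='keefe', max_pages=500))
```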
It would also have been nicer if there were an API for one's own Goodreads library, so I wouldn't have to download a new CSV file whenever I add books. This was, however, just a for-fun coding task.
I'm also missing a 'genre' column, which would be nice for filtering books.
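The export has no genre column. However, the `Bookshelves` column lists every shelf a book is on, so custom shelves named after genres could stand in for one; the column would also have to be added to `columns`. Here is a minimal sketch. It assumes multiple shelf names are separated by `, ` and uses hypothetical shelves such as `history` and `sci-fi`:

```python
import pandas as pd

# Hypothetical shelf assignments; the export above only shows 'to-read'
to_read = pd.DataFrame({
    'Title': ['Say Nothing', 'Seveneves', 'Utz'],
    'Bookshelves': ['to-read, history', 'to-read, sci-fi', 'to-read'],
})


def on_shelf(df: pd.DataFrame, shelf: str) -> pd.DataFrame:
    """Return rows whose comma-separated Bookshelves list contains `shelf` exactly."""
    # Split into lists so 'sci-fi' does not also match e.g. 'sci-fi-classics'
    shelves = df['Bookshelves'].fillna('').str.split(', ')
    return df.loc[shelves.apply(lambda names: shelf in names)]


print(on_shelf(to_read, 'sci-fi'))
```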